Next-Generation Sequence Assembly: Four Stages of Data Processing and Computational Challenges
نویسندگان
چکیده
Decoding DNA symbols using next-generation sequencers was a major breakthrough in genomic research. Despite the many advantages of next-generation sequencers, e.g., the high-throughput sequencing rate and relatively low cost of sequencing, the assembly of the reads produced by these sequencers still remains a major challenge. In this review, we address the basic framework of next-generation genome sequence assemblers, which comprises four basic stages: preprocessing filtering, a graph construction process, a graph simplification process, and postprocessing filtering. Here we discuss them as a framework of four stages for data analysis and processing and survey variety of techniques, algorithms, and software tools used during each stage. We also discuss the challenges that face current assemblers in the next-generation environment to determine the current state-of-the-art. We recommend a layered architecture approach for constructing a general assembler that can handle the sequences generated by different sequencing platforms.
منابع مشابه
Genome assembly reborn: recent computational challenges
Research into genome assembly algorithms has experienced a resurgence due to new challenges created by the development of next generation sequencing technologies. Several genome assemblers have been published in recent years specifically targeted at the new sequence data; however, the ever-changing technological landscape leads to the need for continued research. In addition, the low cost of ne...
متن کاملPreliminary Studies on de novo Assembly with Short Reads
Recent development of next generation sequencing presents new computational challenges to assembly algorithms. Any effective and practical de novo assembly algorithm must confront issues of short read length, base-calling errors and enormous data size. In this report we present our effort to address these challenges in de novo assembly with short reads. Specifically we show that quality scores ...
متن کاملHybrid algorithms for Job shop Scheduling Problem with Lot streaming and A Parallel Assembly Stage
In this paper, a Job shop scheduling problem with a parallel assembly stage and Lot Streaming (LS) is considered for the first time in both machining and assembly stages. Lot Streaming technique is a process of splitting jobs into smaller sub-jobs such that successive operations can be overlapped. Hence, to solve job shop scheduling problem with a parallel assembly stage and lot streaming, deci...
متن کاملPaired de Bruijn Graphs: A Novel Approach for Incorporating Mate Pair Information into Genome Assemblers
The recent proliferation of next generation sequencing with short reads has enabled many new experimental opportunities but, at the same time, has raised formidable computational challenges in genome assembly. One of the key advances that has led to an improvement in contig lengths has been mate pairs, which facilitate the assembly of repeating regions. Mate pairs have been algorithmically inco...
متن کاملDIME: A Novel Framework for De Novo Metagenomic Sequence Assembly
The recently developed next generation sequencing platforms not only decrease the cost for metagenomics data analysis, but also greatly enlarge the size of metagenomic sequence datasets. A common bottleneck of available assemblers is that the trade-off between the noise of the resulting contigs and the gain in sequence length for better annotation has not been attended enough for large-scale se...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره 9 شماره
صفحات -
تاریخ انتشار 2013